AITopics | class noise

class noise

Approximate Borderline Sampling using Granular-Ball for Classification Tasks

Xie, Qin, Zhang, Qinghua, Xia, Shuyin

arXiv.org Artificial Intelligence, Jun-4-2025

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China (d210201029@stu.cqupt.edu.cn, zhangqh@cqupt.edu.cn, xiasy@cqupt.edu.cn)

Abstract: Data sampling enhances classifier efficiency and robustness through data compression and quality improvement. Recently, the sampling method based on granular-ball (GB) has shown promising performance in generality and noisy classification tasks. However, some limitations remain, including the absence of borderline sampling strategies and issues with class boundary blurring or shrinking due to overlap between GBs. In this paper, an approximate borderline sampling method using GBs is proposed for classification tasks. First, a restricted diffusion-based GB generation (RD-GBG) method is proposed, which prevents GB overlaps by constrained expansion, preserving precise geometric representation of GBs via redefined ones. Second, based on the concept of heterogeneous nearest neighbor, a GB-based approximate borderline sampling (GBABS) method is proposed, which is the first general sampling method capable of both borderline sampling and improving the quality of class noise datasets. Additionally, since RD-GBG incorporates noise detection and GBABS focuses on borderline samples, GBABS performs outstandingly on class noise datasets without the need for an optimal purity threshold. Experimental results demonstrate that the proposed methods outperform the GB-based sampling method and several representative sampling methods.

Data sampling plays a pivotal role in supervised machine learning, particularly for classification tasks. It offers a multitude of benefits, including reduced computational complexity, balanced class distributions, diminished effects of noise and outliers, alleviation of overfitting, and enhanced model interpretability.
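
The borderline idea in this abstract can be illustrated with a much simpler point-level stand-in. The sketch below is not the authors' RD-GBG or GBABS procedure (those operate on granular-balls, which the listing does not specify); it only marks a sample as borderline when at least one of its k nearest neighbors carries a different label. The function name `borderline_mask` and the parameter `k` are assumptions made for this example.

```python
import numpy as np

def borderline_mask(X, y, k=3):
    """Mark samples that have at least one differently labeled point
    among their k nearest neighbors (a point-level stand-in, not GBABS)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances (adequate for small datasets).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    return (y[nn_idx] != y[:, None]).any(axis=1)

# Toy 1-D data: class 0 on the left, class 1 on the right.
X = np.array([[0.0], [1.0], [2.0], [2.5], [3.5], [4.5]])
y = np.array([0, 0, 0, 1, 1, 1])
mask = borderline_mask(X, y, k=1)
X_border, y_border = X[mask], y[mask]  # only the two points facing the boundary
```

Keeping only `X[mask]` compresses the data toward the class boundary, which is the general motivation for borderline sampling; how well this works on noisy labels depends on the noise handling that the paper adds and this sketch omits.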

artificial intelligence, fuzzy logic, machine learning, (16 more...)

arXiv.org Artificial Intelligence

arXiv: 2506.02366

Country: Asia > China > Chongqing Province > Chongqing (1.00)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.68)

Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters

Grad, Eden, Kimhi, Moshe, Halika, Lion, Baskin, Chaim

arXiv.org Artificial Intelligence, Jun-18-2024

Obtaining accurate labels for instance segmentation is particularly challenging due to the complex nature of the task. Each image necessitates multiple annotations, encompassing not only the object's class but also its precise spatial boundaries. These requirements elevate the likelihood of errors and inconsistencies in both manual and automated annotation processes. By simulating different noise conditions, we provide a realistic scenario for assessing the robustness and generalization capabilities of instance segmentation models in different segmentation tasks, introducing COCO-N and Cityscapes-N. We also propose a benchmark for weak-annotation noise, dubbed COCO-WAN, which utilizes foundation models and weak annotations to simulate semi-automated annotation tools and their noisy labels. This study sheds light on the quality of segmentation masks produced by various models and challenges the efficacy of popular methods designed to address learning with label noise.
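
To make "simulating different noise conditions" concrete, here is a minimal sketch of two kinds of annotation noise an instance label can suffer: a spatial error (the mask boundary grows by a pixel or more) and a class error (the label is flipped). The abstract does not spell out the noise models behind COCO-N or Cityscapes-N, so the functions `dilate` and `add_label_noise` and all of their parameters are assumptions for illustration only.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 4-connected (plus-shaped) neighborhood."""
    out = np.asarray(mask, dtype=bool).copy()
    for _ in range(iterations):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]   # spread downward
        grown[:-1, :] |= out[1:, :]   # spread upward
        grown[:, 1:] |= out[:, :-1]   # spread right
        grown[:, :-1] |= out[:, 1:]   # spread left
        out = grown
    return out

def add_label_noise(mask, label, num_classes, flip_prob, rng, grow=1):
    """Return a noisy (mask, label) pair: dilated mask, possibly flipped class."""
    noisy_mask = dilate(mask, iterations=grow)
    noisy_label = label
    if rng.random() < flip_prob:
        # Choose uniformly among the other classes (hypothetical noise model).
        others = [c for c in range(num_classes) if c != label]
        noisy_label = int(rng.choice(others))
    return noisy_mask, noisy_label

rng = np.random.default_rng(0)
clean = np.zeros((5, 5), dtype=bool)
clean[2, 2] = True  # a one-pixel "object"
noisy_mask, noisy_label = add_label_noise(clean, label=1, num_classes=3,
                                          flip_prob=1.0, rng=rng)
```

Erosion or boundary jitter could be simulated the same way; the point is that spatial noise changes the mask even when the class label stays correct.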

benchmark, label noise, noise, (16 more...)

arXiv.org Artificial Intelligence

arXiv: 2406.10891

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.68)

A Pitfall of Learning from User-generated Data: In-depth Analysis of Subjective Class Problem

Nemoto, Kei, Jain, Shweta

arXiv.org Machine Learning, Mar-23-2020

Research in the field of supervised learning algorithms implicitly assumes that training data is labeled by domain experts or at least semi-professional labelers accessible through crowdsourcing services like Amazon Mechanical Turk. With the advent of the Internet, data has become abundant and a large number of machine learning based systems have started being trained with user-generated data, using categorical data as true labels. However, little work has been done in the area of supervised learning with user-defined labels where users are not necessarily experts and might be motivated to provide incorrect labels in order to improve their own utility from the system. In this article, we propose two types of classes in user-defined labels, subjective class and objective class, showing that the objective classes are as reliable as if they were provided by domain experts, whereas the subjective classes are subject to bias and manipulation by the user. We define this as a subjective class issue and provide a framework for detecting subjective labels in a dataset without querying an oracle. Using this framework, data mining practitioners can detect a subjective class at an early stage of their projects, and avoid wasting their precious time and resources dealing with the subjective class problem using traditional machine learning techniques.

artificial intelligence, machine learning, social media, (15 more...)

arXiv.org Machine Learning

arXiv: 2003.10621

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.94)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

AI & Data: Avoiding The Gotchas

#artificialintelligence, Feb-12-2019, 19:21:28 GMT

When it comes to an AI (Artificial Intelligence) project, there is usually lots of excitement. The focus is often on using new-fangled algorithms – such as deep learning neural networks – to unlock insights that will transform the business. But in this process, something often gets lost: The importance of establishing the right plan for the data. Keep in mind that 80% of the time of an AI project can be spent on identifying, storing, processing and cleansing data. "The big gotcha is having bad data fed into your AI systems," said David Linthicum, who is the Chief Cloud Strategy Officer at Deloitte Consulting LLP.

ai system, artificial intelligence, machine learning, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)

Noisy Data in Data Mining | Soft Computing and Intelligent Information Systems

#artificialintelligence, Nov-8-2016, 05:10:20 GMT

This website contains a short introduction to noisy data, together with the most relevant bibliography and the complementary material for the SCI2S research group's papers on noisy data in data mining.

data mining, data quality, machine learning, (22 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
(2 more...)

Modelling Class Noise with Symmetric and Asymmetric Distributions

Du, Jun (China University of Geosciences) | Cai, Zhihua (China University of Geosciences)

AAAI Conferences, Mar-6-2015

In classification problems, we assume that the samples around the class boundary are more likely to be incorrectly annotated than others, and propose boundary-conditional class noise (BCN). Based on the BCN assumption, we use unnormalized Gaussian and Laplace distributions to directly model how class noise is generated, in symmetric and asymmetric cases. In addition, we demonstrate that logistic regression and probit regression can also be reinterpreted from this class noise perspective, and compare them with the proposed models. The empirical study shows that the proposed asymmetric models overall outperform the benchmark linear models, and that the asymmetric Laplace-noise model achieves the best performance of all.
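
One plausible reading of the BCN assumption is that the probability of a label being flipped decays with the sample's signed distance to the decision boundary, following an unnormalized Gaussian or Laplace shape, with separate scales on each side in the asymmetric case. The sketch below encodes only that reading. The function names, the `peak` parameter (the flip probability exactly on the boundary), and the scale values are assumptions, not the paper's parameterization.

```python
import numpy as np

def flip_prob_gaussian(dist, sigma=1.0, peak=0.5):
    """Unnormalized Gaussian: flip probability decays with squared distance."""
    dist = np.asarray(dist, dtype=float)
    return peak * np.exp(-np.square(dist) / (2.0 * sigma ** 2))

def flip_prob_laplace(dist, b=1.0, peak=0.5):
    """Unnormalized Laplace: flip probability decays with absolute distance."""
    dist = np.asarray(dist, dtype=float)
    return peak * np.exp(-np.abs(dist) / b)

def flip_prob_asymmetric(dist, prob_fn, scale_neg, scale_pos):
    """Use a different scale on each side of the boundary (dist < 0 vs. dist >= 0)."""
    dist = np.asarray(dist, dtype=float)
    return np.where(dist < 0, prob_fn(dist, scale_neg), prob_fn(dist, scale_pos))

# Signed distances to a hypothetical linear boundary w.x + b = 0.
distances = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
p_sym = flip_prob_laplace(distances, b=1.0)
p_asym = flip_prob_asymmetric(distances, flip_prob_laplace, scale_neg=0.5, scale_pos=2.0)
```

With these scales, a sample one unit on the positive side is far more likely to be mislabeled than one the same distance on the negative side, which is the kind of asymmetry the abstract contrasts with the symmetric case.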

boundary, class noise, noise, (15 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Asia > China > Hubei Province > Wuhan (0.04)

Genre:

Research Report > Experimental Study (0.37)
Research Report > New Finding (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Robustness of Threshold-Based Feature Rankers with Data Sampling on Noisy and Imbalanced Data

Shanab, Ahmad Abu (Florida Atlantic University) | Khoshgoftaar, Taghi M. (Florida Atlantic University) | Wald, Randall (Florida Atlantic University)

AAAI Conferences, May-20-2012

Gene selection has become a vital component in the learning process when using high-dimensional gene expression data. Although extensive research has been done towards evaluating the performance of classifiers trained with the selected features, the stability of feature ranking techniques has received relatively little study. This work evaluates the robustness of eleven threshold-based feature selection techniques, examining the impact of data sampling and class noise on the stability of feature selection. To assess the robustness of feature selection techniques, we use four groups of gene expression datasets, employ eleven threshold-based feature rankers, and generate artificial class noise to better simulate real-world datasets. The results demonstrate that although no ranker consistently outperforms the others, MI and Dev show the best stability on average, while GI and PR show the least stability on average. Results also show that trying to balance datasets through data sampling has on average no positive impact on the stability of feature ranking techniques applied to those datasets. In addition, increased feature subset sizes improve stability, but do so reliably only for noisy datasets.
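
The experimental loop described here (inject artificial class noise, re-rank the features, compare the selected subsets) can be sketched as follows. This is an assumption-laden toy: the ranker is a plain difference of class means rather than any of the paper's eleven threshold-based rankers, stability is measured with Jaccard overlap of the top-k sets (the listing does not say which stability measure the study uses), and all function names are hypothetical.

```python
import numpy as np

def inject_class_noise(y, noise_rate, rng):
    """Flip the labels of a random fraction `noise_rate` of binary (0/1) labels."""
    y = np.asarray(y).copy()
    n_flip = int(round(noise_rate * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y[idx] = 1 - y[idx]
    return y

def top_k_features(X, y, k):
    """Stand-in ranker: score features by absolute difference of class means."""
    X = np.asarray(X, dtype=float)
    score = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return set(np.argsort(-score)[:k].tolist())

def jaccard(a, b):
    """Overlap between two feature subsets (1.0 means identical subsets)."""
    return len(a & b) / len(a | b)

rng = np.random.default_rng(0)
y_clean = np.array([0] * 50 + [1] * 50)
X = rng.normal(size=(100, 20))
X[:, :3] += 2.0 * y_clean[:, None]      # only the first three features are informative

y_noisy = inject_class_noise(y_clean, noise_rate=0.2, rng=rng)
stability = jaccard(top_k_features(X, y_clean, k=3), top_k_features(X, y_noisy, k=3))
```

Repeating the last two lines over several noise rates, subset sizes, and random seeds gives a stability curve per ranker, which is the kind of comparison the abstract reports.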

dataset, noise, stability, (13 more...)

AAAI Conferences

Twenty-Fifth International FLAIRS Conference

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.94)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Inducing Interpretable Voting Classifiers without Trading Accuracy for Simplicity: Theoretical Results, Approximation Algorithms

Nock, R.

arXiv.org Artificial Intelligence, Jun-9-2011

Recent advances in the study of voting classification algorithms have brought empirical and theoretical results clearly showing the discrimination power of ensemble classifiers. It has been previously argued that the search for this classification power in the design of the algorithms has marginalized the need to obtain interpretable classifiers. Therefore, the question of whether one might have to dispense with interpretability in order to keep classification strength is being raised in a growing number of machine learning or data mining papers. The purpose of this paper is to study the problem both theoretically and empirically. First, we provide numerous results giving insight into the hardness of the simplicity-accuracy tradeoff for voting classifiers. Then we provide an efficient "top-down and prune" induction heuristic, WIDC, mainly derived from recent results on the weak learning and boosting frameworks. It is to our knowledge the first attempt to build a voting classifier as a base formula using the weak learning framework (the one which was previously highly successful for decision tree induction), and not the strong learning framework (as usual for such classifiers with boosting-like approaches). While it uses a well-known induction scheme previously successful in other classes of concept representations, thus making it easy to implement and compare, WIDC also relies on recent or new results we give about particular cases of boosting known as partition boosting and ranking loss boosting. Experimental results on thirty-one domains, most of which are readily available, tend to display the ability of WIDC to produce small, accurate, and interpretable decision committees.

artificial intelligence, inducing interpretable voting classifier, machine learning, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.986

arXiv: 1106.1818

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)

Robustness of Filter-Based Feature Ranking: A Case Study

Altidor, Wilker (Florida Atlantic University) | Khoshgoftaar, Taghi M. (Florida Atlantic University) | Hulse, Jason Van (Florida Atlantic University)

AAAI Conferences, May-18-2011

The filter model of feature selection has been well studied. In previous studies, classification performance has traditionally been proposed as a way to evaluate filter solutions. In this study, a new method of comparing feature ranking techniques is presented providing a straightforward approach for quantifying individual filters’ robustness to class noise. Six commonly-used filters, plus one which is rarely used, are investigated regarding their ability to retain, in the presence of class noise, strong classification performance. Three classifiers and one classification performance metric are considered. The experimental results of this study show that Gain Ratio, one of the well known and widely used filters, is very sensitive to class noise. ReliefF offers the best results with both the NB and kNN learners while Signal-to-noise, though not as widely used in the literature as the others, outperforms all the filters with the SVM learner.
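
For readers unfamiliar with the Signal-to-noise filter mentioned above, a commonly used definition scores each feature by the gap between its class means divided by the sum of its class standard deviations. The sketch below implements that common form with an absolute value so it can be used directly for ranking; whether the study uses exactly this variant is an assumption, and `signal_to_noise` and `eps` are names chosen for this example.

```python
import numpy as np

def signal_to_noise(X, y, eps=1e-12):
    """Per-feature score |mean_pos - mean_neg| / (std_pos + std_neg) for 0/1 labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    gap = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    spread = pos.std(axis=0) + neg.std(axis=0)
    return gap / (spread + eps)  # eps guards against constant features

X = np.array([[0.0, 0.0],
              [2.0, 4.0],
              [10.0, 1.0],
              [12.0, 3.0]])
y = np.array([0, 0, 1, 1])
scores = signal_to_noise(X, y)
ranking = np.argsort(-scores)  # feature indices, best first
```

Because the score depends only on class-wise means and spreads, a small fraction of flipped labels shifts it gradually, which is one intuition (not a claim from the paper) for why such a filter might degrade gracefully under class noise.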

class noise, classification performance, noise, (14 more...)

AAAI Conferences

Twenty-Fourth International FLAIRS Conference

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

Zhou, Hongbo, Cheng, Qiang

Neural Information Processing Systems, Dec-31-2010

Regularization techniques have become a principal tool for statistics and machine learning research and practice. However, in most situations, these regularization terms are not well interpreted, especially regarding how they are related to the loss function and data. In this paper, we propose a robust minimax framework to interpret the relationship between data and regularization terms for a large class of loss functions. We show that various regularization terms essentially correspond to different distortions of the original data matrix. This minimax framework includes ridge regression, lasso, elastic net, fused lasso, group lasso, local coordinate coding, multiple kernel learning, etc., as special cases. Within this minimax framework, we further give a mathematically exact definition for a novel representation called sparse grouping representation (SGR), and prove sufficient conditions for generating such group level sparsity. Under these sufficient conditions, a large set of consistent regularization terms can be designed. SGR is essentially different from group lasso in the way it uses class or group information, and it outperforms group lasso when group label noise is present. We also give some generalization bounds in a classification setting.
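
The claim that regularization terms correspond to distortions of the data matrix can be checked numerically in one classical special case, which is used here only as an illustration and is not the paper's general framework: if the perturbation D of the data matrix is bounded in Frobenius norm by c, the worst-case residual norm ||y - (X + D)w|| equals ||y - Xw|| + c||w||, an (unsquared) l2-regularized loss. The function name `worst_case_residual` and the random test data are assumptions.

```python
import numpy as np

def worst_case_residual(X, y, w, c):
    """Build the Frobenius-norm-c perturbation that maximizes ||y - (X + D) w||."""
    r = y - X @ w
    u = r / np.linalg.norm(r)
    # Rank-one perturbation aligned against the residual; its Frobenius norm is c.
    delta = -c * np.outer(u, w) / np.linalg.norm(w)
    return np.linalg.norm(y - (X + delta) @ w), delta

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
w = rng.normal(size=5)
y = rng.normal(size=20)
c = 0.3

worst, delta = worst_case_residual(X, y, w, c)
regularized = np.linalg.norm(y - X @ w) + c * np.linalg.norm(w)
gap = abs(worst - regularized)  # agrees up to floating-point error
```

Minimizing the worst-case loss over w is therefore the same as minimizing the regularized loss; changing the shape of the allowed perturbation set changes which penalty appears, which is the general correspondence the abstract describes.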

artificial intelligence, lasso, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)
